HDTR-Net: A Real-Time High-Definition Teeth Restoration Network for Arbitrary Talking Face Generation Methods
Talking Face Generation (TFG) aims to synthesize natural lip movements from audio by exploiting the latent connections between audio and facial features. Existing TFG methods have made significant advances in producing natural and realistic images; however, most pay little attention to visual quality, and it is challenging to preserve lip synchronization while avoiding visual quality degradation in cross-modal generation. To
address this issue, we propose a universal High-Definition Teeth Restoration
Network, dubbed HDTR-Net, for arbitrary TFG methods. HDTR-Net can enhance teeth
regions at extremely fast speed while maintaining synchronization and temporal consistency. In particular, we propose a Fine-Grained Feature Fusion (FGFF) module that effectively captures fine texture features in the teeth and surrounding regions and uses them to refine the feature map, enhancing the clarity of the teeth. Extensive experiments show that our method can be adapted to arbitrary TFG methods without degrading lip synchronization or frame coherence. Another advantage of HDTR-Net is its real-time generation ability: even under high-definition restoration of synthesized talking-face video, its inference is faster than current state-of-the-art super-resolution-based face restoration.
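The abstract only names the Fine-Grained Feature Fusion idea, so the following is a minimal PyTorch sketch of how fine teeth-region texture features might be fused back into a coarser face feature map. The class name FineGrainedFusion, the gated residual design, and all tensor shapes are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a fine-grained fusion step; not the paper's code.
import torch
import torch.nn as nn

class FineGrainedFusion(nn.Module):
    """Fuses fine-texture features (e.g. from a teeth-region crop) into a
    coarser face feature map via a gated residual refinement."""
    def __init__(self, channels: int):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, face_feat, teeth_feat):
        # Concatenate coarse face features with fine teeth-region features.
        fused = self.refine(torch.cat([face_feat, teeth_feat], dim=1))
        # A channel-wise gate decides how much fine detail to inject back.
        return face_feat + self.gate(fused) * fused

if __name__ == "__main__":
    fusion = FineGrainedFusion(channels=64)
    face = torch.randn(1, 64, 32, 32)
    teeth = torch.randn(1, 64, 32, 32)
    print(fusion(face, teeth).shape)  # torch.Size([1, 64, 32, 32])
```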
RainDiffusion: When Unsupervised Learning Meets Diffusion Models for Real-world Image Deraining
What will happen when unsupervised learning meets diffusion models for
real-world image deraining? To answer this question, we propose RainDiffusion, the first unsupervised image deraining paradigm based on diffusion models. Beyond the traditional unsupervised wisdom of image deraining, RainDiffusion introduces stable training on unpaired real-world data instead of weak adversarial training. RainDiffusion consists of two cooperative branches: Non-diffusive
Translation Branch (NTB) and Diffusive Translation Branch (DTB). NTB exploits a
cycle-consistent architecture to bypass the difficulty in unpaired training of
standard diffusion models by generating initial clean/rainy image pairs. DTB
leverages two conditional diffusion modules to progressively refine the desired
output using the initial image pairs and a diffusive generative prior, yielding better generalization in both deraining and rain generation. RainDiffusion is a non-adversarial training paradigm, setting a new standard for real-world image deraining. Extensive experiments confirm the superiority of RainDiffusion over un/semi-supervised methods and show its competitive advantages over fully-supervised ones.
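As a rough illustration of the two-branch flow, the sketch below wires together tiny placeholder convolutional networks: NTB-style translators build pseudo clean/rainy pairs with a cycle loss, and DTB-style refiners are conditioned on those pairs by channel concatenation. Every module name and loss term here is an assumption for exposition, and the stand-ins are plain CNNs, not actual conditional diffusion models.

```python
# High-level sketch of an NTB/DTB-style step with placeholder CNNs.
import torch
import torch.nn as nn

def conv_net(in_ch: int, out_ch: int) -> nn.Module:
    return nn.Sequential(nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(64, out_ch, 3, padding=1))

# Non-diffusive Translation Branch: cycle-consistent translators that turn
# unpaired real images into pseudo clean/rainy pairs (illustrative stand-ins).
ntb_rain2clean, ntb_clean2rain = conv_net(3, 3), conv_net(3, 3)
# Diffusive Translation Branch: two conditional refiners standing in for the
# conditional diffusion modules that polish the pseudo pairs.
dtb_derain, dtb_rain_gen = conv_net(6, 3), conv_net(6, 3)

rainy_real = torch.rand(1, 3, 64, 64)   # unpaired real rainy image
clean_real = torch.rand(1, 3, 64, 64)   # unpaired real clean image

# Step 1 (NTB): build initial pseudo pairs without paired supervision.
pseudo_clean = ntb_rain2clean(rainy_real)
pseudo_rainy = ntb_clean2rain(clean_real)
cycle_loss = (ntb_clean2rain(pseudo_clean) - rainy_real).abs().mean() + \
             (ntb_rain2clean(pseudo_rainy) - clean_real).abs().mean()

# Step 2 (DTB): condition each refiner on its pseudo pair to produce the output.
derained = dtb_derain(torch.cat([rainy_real, pseudo_clean], dim=1))
regenerated_rain = dtb_rain_gen(torch.cat([clean_real, pseudo_rainy], dim=1))
print(cycle_loss.item(), derained.shape, regenerated_rain.shape)
```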
Don't worry about mistakes! Glass Segmentation Network via Mistake Correction
Recall a time when we were in an unfamiliar mall: we might mistakenly believe that a pane of glass is, or is not, in front of us. Such mistakes remind us to move more safely and confidently the next time we visit the same or a similar place. To absorb this human wisdom of learning from mistakes, we propose a novel glass segmentation network for detecting transparent glass, dubbed GlassSegNet. Motivated by this behavior, GlassSegNet utilizes two key
stages: the identification stage (IS) and the correction stage (CS). The IS is
designed to simulate the detection procedure of human recognition for
identifying transparent glass by global context and edge information. The CS
then progressively refines the coarse prediction by correcting mistake regions
based on gained experience. Extensive experiments show clear improvements of
our GlassSegNet over thirty-four state-of-the-art methods on three benchmark
datasets.
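A minimal sketch of the identify-then-correct pattern is given below, assuming a toy backbone and an uncertainty-weighted correction step; the module names, the uncertainty heuristic, and the shapes are illustrative assumptions, not the paper's architecture.

```python
# Two-stage identify-then-correct sketch with hypothetical toy modules.
import torch
import torch.nn as nn

class IdentificationStage(nn.Module):
    """Produces a coarse glass mask from image features (global + edge cues)."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                      nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(32, 1, 1)

    def forward(self, x):
        feat = self.backbone(x)
        return feat, torch.sigmoid(self.head(feat))

class CorrectionStage(nn.Module):
    """Refines the coarse mask by focusing on likely-mistaken regions."""
    def __init__(self):
        super().__init__()
        self.refine = nn.Sequential(nn.Conv2d(33, 32, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(32, 1, 1))

    def forward(self, feat, coarse):
        # A simple uncertainty proxy: highest where the prediction is near 0.5.
        uncertainty = 1.0 - (2.0 * coarse - 1.0).abs()
        correction = self.refine(torch.cat([feat * uncertainty, coarse], dim=1))
        # Add the correction to the coarse prediction in logit space.
        return torch.sigmoid(correction + torch.logit(coarse, eps=1e-6))

image = torch.rand(1, 3, 64, 64)
feat, coarse = IdentificationStage()(image)
refined = CorrectionStage()(feat, coarse)
print(coarse.shape, refined.shape)  # both torch.Size([1, 1, 64, 64])
```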
Joint Depth Estimation and Mixture of Rain Removal From a Single Image
Rainy weather significantly deteriorates the visibility of scene objects,
particularly when images are captured through outdoor camera lenses or
windshields. Through careful observation of numerous rainy photos, we have
found that the images are generally affected by various rainwater artifacts
such as raindrops, rain streaks, and rainy haze, which impact the image quality
from both near and far distances, resulting in a complex and intertwined
process of image degradation. However, current deraining techniques can typically address only one or two types of rainwater, which makes removing the mixture of rain (MOR) challenging. In this study, we propose an
effective image deraining paradigm for Mixture of rain REmoval, called
DEMore-Net, which takes full account of the MOR effect. Going beyond the
existing deraining wisdom, DEMore-Net is a joint learning paradigm that
integrates depth estimation and MOR removal tasks to achieve superior rain
removal. Depth information offers additional distance-based guidance, helping DEMore-Net better remove different types of rainwater. Moreover, this study explores normalization approaches in
image deraining tasks and introduces a new Hybrid Normalization Block (HNB) to
enhance the deraining performance of DEMore-Net. Extensive experiments
conducted on synthetic datasets and real-world MOR photos fully validate the
superiority of the proposed DEMore-Net. Code is available at
https://github.com/yz-wang/DEMore-Net.
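The abstract does not detail the Hybrid Normalization Block, so here is one plausible PyTorch sketch in which half of the channels are batch-normalized and half are instance-normalized before a 1x1 fusion; this particular split is an assumption for illustration, not the paper's exact design.

```python
# Hypothetical hybrid-normalization sketch: BN on half the channels,
# IN on the other half, fused by a 1x1 convolution.
import torch
import torch.nn as nn

class HybridNormBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        assert channels % 2 == 0
        self.bn = nn.BatchNorm2d(channels // 2)
        self.inorm = nn.InstanceNorm2d(channels // 2, affine=True)
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = torch.chunk(x, 2, dim=1)
        return self.fuse(torch.cat([self.bn(a), self.inorm(b)], dim=1))

x = torch.randn(2, 64, 32, 32)
print(HybridNormBlock(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```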
SVDFormer: Complementing Point Cloud via Self-view Augmentation and Self-structure Dual-generator
In this paper, we propose a novel network, SVDFormer, to tackle two specific
challenges in point cloud completion: understanding faithful global shapes from
incomplete point clouds and generating high-accuracy local structures. Current
methods either perceive shape patterns using only 3D coordinates or import
extra images with well-calibrated intrinsic parameters to guide the geometry
estimation of the missing parts. However, these approaches do not always fully
leverage the cross-modal self-structures available for accurate and
high-quality point cloud completion. To this end, we first design a Self-view
Fusion Network that leverages multi-view depth image information to observe the incomplete self-shape and generate a compact global shape. To reveal highly
detailed structures, we then introduce a refinement module, called
Self-structure Dual-generator, in which we incorporate learned shape priors and
geometric self-similarities for producing new points. By perceiving the
incompleteness of each point, the dual-path design disentangles refinement
strategies conditioned on the structural type of each point. SVDFormer absorbs
the wisdom of self-structures, avoiding any additional paired information such
as color images with precisely calibrated camera intrinsic parameters.
Comprehensive experiments indicate that our method achieves state-of-the-art
performance on widely-used benchmarks. Code will be available at
https://github.com/czvvd/SVDFormer.
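To make the "self-view" idea concrete, the sketch below renders an incomplete point cloud into orthographic depth maps from a few fixed viewpoints, the kind of input a 2D encoder could then fuse. The resolution, viewpoints, and rendering scheme are illustrative assumptions rather than the released code.

```python
# Toy self-view rendering: project a partial point cloud to depth maps.
import math
import torch

def depth_render(points: torch.Tensor, res: int = 64) -> torch.Tensor:
    """Orthographic depth map of an (N, 3) point cloud viewed along +z.
    Points are assumed to lie roughly inside the cube [-1, 1]^3."""
    xy = ((points[:, :2].clamp(-1, 1) + 1) * 0.5 * (res - 1)).long()
    idx = xy[:, 1] * res + xy[:, 0]                  # flat pixel index per point
    z = (points[:, 2].clamp(-1, 1) + 1) * 0.5        # depth in [0, 1]
    depth = torch.full((res * res,), float("inf"))
    depth.scatter_reduce_(0, idx, z, reduce="amin")  # keep nearest point per pixel
    depth[torch.isinf(depth)] = 0.0                  # empty pixels -> background
    return depth.view(res, res)

pts = torch.rand(2048, 3) * 2 - 1                    # a toy partial point cloud
views = []
for angle in (0.0, 120.0, 240.0):                    # three self-views around y
    t = math.radians(angle)
    rot = torch.tensor([[math.cos(t), 0.0, math.sin(t)],
                        [0.0, 1.0, 0.0],
                        [-math.sin(t), 0.0, math.cos(t)]])
    views.append(depth_render(pts @ rot.T))
print(torch.stack(views).shape)                      # torch.Size([3, 64, 64])
```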
Anchor Retouching via Model Interaction for Robust Object Detection in Aerial Images
Object detection has made tremendous strides in computer vision. Small object
detection with appearance degradation is a prominent challenge, especially for
aerial observations. To collect sufficient positive/negative samples for
heuristic training, most object detectors preset region anchors in order to
calculate Intersection-over-Union (IoU) against the ground-truth data. In
this case, small objects are frequently abandoned or mislabeled. In this paper,
we present an effective Dynamic Enhancement Anchor (DEA) network to construct a
novel training sample generator. Different from other state-of-the-art
techniques, the proposed network leverages a sample discriminator to realize
interactive sample screening between an anchor-based unit and an anchor-free
unit to generate eligible samples. Besides, multi-task joint training with a
conservative anchor-based inference scheme enhances the performance of the
proposed model while reducing computational complexity. The proposed scheme
supports both oriented and horizontal object detection tasks. Extensive
experiments on two challenging aerial benchmarks (i.e., DOTA and HRSC2016)
indicate that our method achieves state-of-the-art performance in accuracy with
moderate inference speed and computational overhead for training. On DOTA, our
DEA-Net integrated with the RoI-Transformer baseline surpasses the advanced method by 0.40% mean Average Precision (mAP) for oriented object detection with a weaker backbone (ResNet-101 vs. ResNet-152) and by 3.08% mAP for horizontal object detection with the same backbone. Besides, our DEA-Net integrated with the ReDet baseline achieves state-of-the-art performance of 80.37% mAP. On HRSC2016, it surpasses the previous best model by 1.1% using only 3 horizontal anchors.
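For context on why small objects are "frequently abandoned or mislabeled", the snippet below shows the standard IoU-based anchor labeling that the sample discriminator is meant to improve upon: with a common 0.5 positive threshold (an assumed default, not the paper's setting), a tiny ground-truth box fails to match any preset anchor.

```python
# Standard IoU-based anchor labeling; thresholds and boxes are illustrative.
import torch

def box_iou(anchors: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """IoU between (A, 4) anchors and (G, 4) ground-truth boxes in xyxy format."""
    lt = torch.max(anchors[:, None, :2], gt[None, :, :2])
    rb = torch.min(anchors[:, None, 2:], gt[None, :, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=2)
    area_a = (anchors[:, 2:] - anchors[:, :2]).prod(dim=1)
    area_g = (gt[:, 2:] - gt[:, :2]).prod(dim=1)
    return inter / (area_a[:, None] + area_g[None, :] - inter)

anchors = torch.tensor([[0., 0., 32., 32.], [8., 8., 16., 16.]])
gt      = torch.tensor([[10., 10., 14., 14.]])   # a tiny object
iou = box_iou(anchors, gt)
positive = iou.max(dim=1).values >= 0.5          # tiny boxes rarely pass this
print(iou)        # max IoU is only 0.25 here
print(positive)   # tensor([False, False]) -> the small object gets no positive anchor
```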
- …